XEROX



Pointer manipulation is tricky. A source of irritation is that a 
programmer occasionally finds himself one step further down a list than he 
would like to be. Another is having to fiddle at the beginning or end of a 
structure or treat the empty structure as a special case. The situation 
can be ameliorated by taking the CPL view of data structures [S,P]. I was
exposed to this view several years ago, but only recently came to
appreciate it.

The most common and obvious method of altering a structure is to change a
component of a node (e.g. rplacd in LISP), which, interpreted graphically,
amounts to swinging a pointer; i.e. moving its arrowed end. The CPL method
is to overwrite the entire node. Graphically this amounts to moving the
unarrowed end(s) of one or more pointers at once. Figure 1 illustrates
these two kinds of transformation.

It is easy to simulate pointer swinging by node overwriting: 

rplacd[x;z] = overwrite x with cons[car[x]; z] 

It is not so easy to reverse the simulation because the overwrite scheme
allows one to change the amount of information in a node. By implication,
node overwriting is more expensive to implement, either in terms of
space-time or complexity.
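
The simulation above can be sketched in Python. This is only an illustration under stated assumptions: `Cell`, `overwrite` and the Python `rplacd` are hypothetical names introduced here, with `Cell` standing in for a LISP node and field copying standing in for a true node overwrite.

```python
class Cell:
    """A cons-like node with a head (car) and a tail (cdr)."""

    def __init__(self, car, cdr=None):
        self.car = car
        self.cdr = cdr


def overwrite(target, source):
    """Node overwrite: replace the entire contents of target with those of source."""
    target.car, target.cdr = source.car, source.cdr


def rplacd(x, z):
    """Pointer swing simulated as: overwrite x with cons[car[x]; z]."""
    overwrite(x, Cell(x.car, z))


z = Cell(5)
x = Cell(3, Cell(4))
alias = x          # a second reference to the same node
rplacd(x, z)       # every holder of the node now sees the new tail
```

Because the node itself is changed, `alias` observes the new tail as well, which is the property both kinds of transformation share.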



To: MPS Group
From: J. Morris
Subject: Pointer Swinging vs. Node Overwriting



Figure 1. Two kinds of structure change.
(a) Pointer Swing: rplacd[x;z]
(b) Node Overwrite: rplaca[y;car[z]]; rplacd[y;cdr[z]]

In principle node overwriting is supported by any language with union types
and reference variables or their equivalents; e.g. ALGOL-68, PASCAL [vW,W].
I shall use PASCAL to illustrate it.

Suppose one wishes to implement lists of integers. He makes the 
declaration 

type list = ↑ record hd: integer; tl: list end

which says that a value of type list is a pointer to a record consisting of 
an integer and a list. The constant nil is implicitly a pointer of any 
type and is used to represent the empty list. 



If x is declared a list, by

var x:list

the value of

x↑

is its contents, a record, and the values of

x↑.hd and x↑.tl

are the respective components of the record. Thus getting the tail of a 
list is a two step process: taking the contents of a pointer and selecting 
a component of the contents. 

There are basically two kinds of assignment:

x:=y

changes the value of x;

x↑:=z

changes the contents of the pointer x. An assignment like

E.hd:=3

should be regarded as an abbreviation for

E := <3, E.tl>

whatever E happens to be. E.g.

x↑.hd:=3

changes the contents of the pointer x and happens to leave its tl
unchanged.

The representation chosen here for lists uses the pointer swinging
strategy. It induces the irritations discussed at the beginning, as the
following example illustrates.

Suppose one wishes to delete all the odd numbers from a list l. In this
representation a deletion must be accomplished by changing the tl of the
preceding element. Thus one must hang on to the element preceding the one
whose hd he is examining. To make matters worse, if the element is the
first one on the list, the deletion must be done by a simple assignment to
l. These facts contribute to the opacity of the program:

L: if l=nil then goto End;
   if ¬odd(l↑.hd) then goto M;
   l:=l↑.tl; goto L;
M: x:=l;
   while x↑.tl≠nil do
     if odd(x↑.tl↑.hd)
       then x↑.tl:=x↑.tl↑.tl
       else x:=x↑.tl
End:

The reader is invited to simplify the program; his taste may suggest using
two variables to scan the list, using LISP or ALGOL-W notation to avoid all
the "↑."s, or eliminating the goto's. It's still pretty bad. (A referee
who rewrote it to eliminate goto's introduced a bug.)
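
For comparison, here is a rough Python rendering of the pointer swinging deletion, written without goto's. It is a sketch, not the memo's program: `Node`, `from_items`, `to_items` and `delete_odds` are hypothetical names, and `None` plays the role of nil.

```python
class Node:
    """Pointer-swinging representation: a record with hd and tl; None is nil."""

    def __init__(self, hd, tl=None):
        self.hd = hd
        self.tl = tl


def from_items(items):
    """Build a linked list from a Python sequence."""
    head = None
    for v in reversed(items):
        head = Node(v, head)
    return head


def to_items(l):
    """Collect the hd fields of a linked list into a Python list."""
    out = []
    while l is not None:
        out.append(l.hd)
        l = l.tl
    return out


def delete_odds(l):
    """Delete odd numbers; the caller must use the returned head."""
    # Special case: leading odd elements are removed by reassigning l itself.
    while l is not None and l.hd % 2 == 1:
        l = l.tl
    if l is None:
        return None
    # General case: hold on to the element preceding the one being examined.
    x = l
    while x.tl is not None:
        if x.tl.hd % 2 == 1:
            x.tl = x.tl.tl      # swing the predecessor's tl pointer
        else:
            x = x.tl
    return l
```

Even in this form the two irritations remain visible: the head of the list needs its own loop, and the scan always looks one element ahead.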

The cure for the problems is to adopt an "unobvious" representation for 
lists using the node overwrite strategy. 

Statically the change seems quite minor: a list becomes a pointer to a
union type half of which is an empty indicator. PASCAL's way of saying
this is

type list = ↑ record case empty: Boolean of
                true: ;
                false: (hd: integer; tl: list)
              end

The value of

x↑.empty

will tell one if x is empty.

The dynamics of the situation are quite different. To change a structure
one usually overwrites the entire contents of a pointer; e.g.,

x↑ := x↑.tl↑

removes x↑.hd from a list by changing both x↑.hd and x↑.tl.

Now the program to delete odd numbers from l is reasonable.

x:=l;
while ¬x↑.empty do
  if odd(x↑.hd)
    then x↑:=x↑.tl↑
    else x:=x↑.tl
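
The same loop can be tried out in Python. This is a minimal sketch under assumptions: the `Node` class, its `assign` method (standing in for the overwrite `x↑ := ...`), and the helpers `cons`, `from_items` and `to_items` are hypothetical names, not part of the memo.

```python
class Node:
    """Node-overwrite representation: a node is either empty or holds (hd, tl)."""

    def __init__(self):
        self.empty = True
        self.hd = None
        self.tl = None

    def assign(self, other):
        """Overwrite this node's entire contents with other's contents."""
        self.empty, self.hd, self.tl = other.empty, other.hd, other.tl


def cons(hd, tl):
    """Make a non-empty node."""
    n = Node()
    n.empty, n.hd, n.tl = False, hd, tl
    return n


def from_items(items):
    """Build a list terminated by an empty node."""
    l = Node()
    for v in reversed(items):
        l = cons(v, l)
    return l


def to_items(l):
    out = []
    while not l.empty:
        out.append(l.hd)
        l = l.tl
    return out


def delete_odds(l):
    """Delete odd numbers in place; no special case for the first element."""
    x = l
    while not x.empty:
        if x.hd % 2 == 1:
            x.assign(x.tl)      # overwrite the node with its successor
        else:
            x = x.tl
```

Since the first node is overwritten rather than bypassed, the caller's reference to the list stays valid and nothing needs to be returned.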

Lest the reader suspect this example was cooked, several more are given in
an appendix.

Figure 2. Two methods of list representation: (a) pointer swinging; (b) node overwrite

Recommendations for Language Design and Implementation

The language should allow people to use the node overwriting strategy and
should not penalize them excessively by implementing structures naively.
Although PASCAL and ALGOL-68 allow it, I suspect their implementors
discourage it.



An Implementation

The most straightforward implementation (used in CPL) uses extra pointers
which the user cannot access directly. For example the list l = (1,2,3) is
represented by

[Diagram: boxes with addresses a1 through a7.]



The addresses of the smaller boxes represent pointer values; assignments 
through pointers change the contents of these boxes. There is nothing the
user can say to change the contents of the larger boxes. In terms of 
PASCAL notation 

l = a1
l↑ = a2
l↑.tl = a3
l↑.tl↑ = a4
etc.

However, recall that

l↑.tl := p

means

l↑ := <l↑.hd, p>

which changes the contents of a1, not a2.
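
One way to read this scheme is sketched below in Python. It is an interpretation, not a description of the CPL implementation: `Box` (a smaller box) and `Record` (a larger box) are hypothetical names, and `None` inside a box stands for the empty list.

```python
from collections import namedtuple

# A "larger box": an immutable record that the user can never modify in place.
Record = namedtuple("Record", ["hd", "tl"])


class Box:
    """A 'smaller box': the thing a user-level pointer value refers to."""

    def __init__(self, record=None):
        self._record = record      # hidden pointer; None stands for the empty list

    def contents(self):
        """Corresponds to l↑: the record currently held in the box."""
        return self._record

    def assign(self, record):
        """Corresponds to l↑ := record: replace the whole contents of the box."""
        self._record = record

    def set_tl(self, p):
        """Corresponds to l↑.tl := p, treated as l↑ := <l↑.hd, p>."""
        self.assign(Record(self._record.hd, p))


# The list (1,2,3): four boxes and three records, seven addresses in all.
l = Box(Record(1, Box(Record(2, Box(Record(3, Box()))))))
old_record = l.contents()
p = Box()
l.set_tl(p)    # the box changes; the old record is left untouched
```

Under this reading the field assignment never touches the old record; it only makes the box refer to a freshly built one.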

The reader's reaction to this description is likely to be what mine was:
"Hiding all those pointers from the user is a bad thing." Considering how
long it took me to reject that view, I doubt anything I say can decisively
refute it. My only suggestion is that he try to write some of the example
programs using the pointer swinging strategy, and then multiply the hassle
he experiences by the number of programmers who will write similar
programs.

Appendix: Further examples of node overwrite programs



Each of the examples is done using the node overwrite strategy. I found 
the pointer swinging versions troublesome. 

(a) List insertion. 

Using the node overwrite definition of list, insert i in the ordered list
l.

procedure insert(i:integer; l:list);
var n,x:list;
begin x:=l;
  while ¬x↑.empty & i<x↑.hd do x:=x↑.tl;
  new(n); {allocate a new node}
  n↑:=x↑;
  x↑.hd:=i; x↑.tl:=n; x↑.empty:=false
end

(PASCAL's syntax would be improved if one could replace the last line
by something like

x↑:=<i,n>)
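
A Python sketch of the same insertion follows. The names `Node`, `assign`, `cons`, `from_items` and `to_items` are hypothetical helpers introduced for the illustration. The loop test `i < x.hd` is copied from the procedure above, so the sketch assumes the list is kept in descending order.

```python
class Node:
    """Node-overwrite representation: a node is either empty or holds (hd, tl)."""

    def __init__(self):
        self.empty = True
        self.hd = None
        self.tl = None

    def assign(self, other):
        """Overwrite this node's entire contents with other's contents."""
        self.empty, self.hd, self.tl = other.empty, other.hd, other.tl


def cons(hd, tl):
    n = Node()
    n.empty, n.hd, n.tl = False, hd, tl
    return n


def from_items(items):
    l = Node()
    for v in reversed(items):
        l = cons(v, l)
    return l


def to_items(l):
    out = []
    while not l.empty:
        out.append(l.hd)
        l = l.tl
    return out


def insert(i, l):
    """Insert i into l, assumed to be in descending order."""
    x = l
    while not x.empty and i < x.hd:
        x = x.tl
    n = Node()                 # allocate a new node
    n.assign(x)                # the new node takes over x's old contents
    x.assign(cons(i, n))       # then x itself is overwritten with <i, n>
```

The front, the middle, the end, and the empty list are all handled by the same three statements, because the node found by the scan is overwritten rather than linked to.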

(b) Tree insertion. 

Given 

type tree = ↑ record case empty: Boolean of
                true: ;
                false: (data: integer; l,r: tree)
              end

write a procedure to insert into a tree so that post-order scan orders the 
numbers. 

procedure insert(i:integer; t:tree);
var x:tree;
begin x:=t;
  while ¬x↑.empty do
    x:=if i<x↑.data
         then x↑.l
         else x↑.r;
  x↑.empty:=false;
  x↑.data:=i;
  new(x↑.l); x↑.l↑.empty:=true;
  new(x↑.r); x↑.r↑.empty:=true
end

This example illustrates a potentially disastrous waste of space caused by 
the node overwrite strategy. The leaves of the tree are always empty yet 
must be big enough to hold an integer and two pointers; thus a tree 
requires twice as much space as it should. A minor re-design of PASCAL
might allow the implementor to be clever and materialize empty nodes only 
when there are multiple references to them. 
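
The space cost can be checked with a small Python sketch of the tree insertion. `Tree`, `in_order` and `count_nodes` are hypothetical names; the traversal used here visits the left subtree, then the node, then the right subtree.

```python
class Tree:
    """Node-overwrite tree node: either empty or holding (data, l, r)."""

    def __init__(self):
        self.empty = True
        self.data = None
        self.l = None
        self.r = None


def insert(i, t):
    """Walk down to an empty node and overwrite it with a filled one."""
    x = t
    while not x.empty:
        x = x.l if i < x.data else x.r
    x.empty = False
    x.data = i
    x.l = Tree()       # two fresh empty leaves are allocated per insertion
    x.r = Tree()


def in_order(t):
    """Left subtree, node, right subtree."""
    if t.empty:
        return []
    return in_order(t.l) + [t.data] + in_order(t.r)


def count_nodes(t):
    """Return (filled, empty) node counts."""
    if t.empty:
        return (0, 1)
    fl, el = count_nodes(t.l)
    fr, er = count_nodes(t.r)
    return (fl + fr + 1, el + er)
```

A tree built this way with k numbers always carries k+1 empty leaves, each as large as a filled node in a naive implementation, which is the doubling described above.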

(c) Radix Sort. 

procedure sort(n:list);
var f,l: array[0..9] of list;
    c,t: integer;
{assume all the numbers are <100000}
begin for t:=0 to 9 do new(f[t]);
  c:=1;
  while c<100000 do
    begin for t:=0 to 9 do l[t]:=f[t];
      while ¬n↑.empty do
        begin t:=n↑.hd/c mod 10;
          l[t]↑:=n↑;
          l[t]:=n↑.tl;
          n:=n↑.tl
        end;
      for t:=9 downto 0 do
        begin l[t]↑:=n↑; n↑:=f[t]↑ end;
      c:=c*10 {advance to the next digit}
    end
end

The pointer swinging approach will require one to worry about empty lists;
here one only has to be sure to concatenate from back to front.
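
The back-to-front concatenation can be illustrated with a simplified Python sketch. It is not a transcription of the procedure above: it allocates fresh bucket nodes on every pass instead of reusing the input nodes, and `Node`, `assign`, `from_items`, `to_items`, `radix_sort` and the `digits` parameter are hypothetical names. It assumes non-negative integers below 10 to the power `digits`.

```python
class Node:
    """Node-overwrite representation: a node is either empty or holds (hd, tl)."""

    def __init__(self):
        self.empty = True
        self.hd = None
        self.tl = None

    def assign(self, other):
        """Overwrite this node's entire contents with other's contents."""
        self.empty, self.hd, self.tl = other.empty, other.hd, other.tl


def from_items(items):
    l = Node()
    for v in reversed(items):
        n = Node()
        n.empty, n.hd, n.tl = False, v, l
        l = n
    return l


def to_items(l):
    out = []
    while not l.empty:
        out.append(l.hd)
        l = l.tl
    return out


def radix_sort(items, digits=5):
    n = from_items(items)
    c = 1
    for _ in range(digits):
        first = [Node() for _ in range(10)]   # head of each bucket
        last = list(first)                    # empty terminator of each bucket
        x = n
        while not x.empty:
            t = (x.hd // c) % 10
            # Append by overwriting the bucket's empty terminator.
            last[t].empty, last[t].hd, last[t].tl = False, x.hd, Node()
            last[t] = last[t].tl
            x = x.tl
        # Concatenate from back to front; empty buckets need no special case.
        n = Node()
        for t in range(9, -1, -1):
            last[t].assign(n)
            n = first[t]
        c *= 10
    return to_items(n)
```

An empty bucket is just a node whose terminator is also its head, so overwriting the terminator splices the bucket out without any test.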

Another apparent expense of node overwriting is brought out by this
example. Suppose the hds of lists were 80 character arrays. Then
assignments like l[t]↑:=n↑ might involve many memory references. The
implementor can ameliorate things by using pointers behind the scenes. He
should resist the temptation to allow the user to swing these pointers.

(d) Two-way lists. 

Node overwriting seems inappropriate for two-way lists. The same
declaration as for tree will suffice for nodes on two-way lists. To delete
a node x from its list one could say

t:=x↑.r;
x↑:=x↑.l↑;
x↑.r:=t

but that seems strange and wouldn't work for two node circular lists. A
pointer swinging change

x↑.l↑.r:=x↑.r;
x↑.r↑.l:=x↑.l

seems better. 
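
The pointer swinging deletion is easy to exercise in Python on a circular two-way list. This is a sketch with hypothetical names (`DNode`, `insert_after`, `delete`, `ring_items`); a freshly created node is assumed to form a one-node ring.

```python
class DNode:
    """A node on a circular two-way list; a new node is a ring of one."""

    def __init__(self, data):
        self.data = data
        self.l = self
        self.r = self


def insert_after(x, n):
    """Link node n into the ring immediately to the right of x."""
    n.l, n.r = x, x.r
    x.r.l = n
    x.r = n


def delete(x):
    """Pointer swinging deletion: both neighbours are made to bypass x."""
    x.l.r = x.r
    x.r.l = x.l


def ring_items(start):
    """Collect the data fields by following r pointers once around the ring."""
    out = [start.data]
    y = start.r
    while y is not start:
        out.append(y.data)
        y = y.r
    return out
```

The two assignments also handle a two-node ring, leaving the surviving node linked to itself in both directions.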

(e) Expression evaluation. 

Suppose arithmetic expressions are represented according to

type exp = ↑ record case op: etype of
               const: (val: integer);
               sum: (l,r: exp)
             end

The following procedure evaluates the expression, avoiding re-evaluation of
shared sub-expressions.

procedure eval(e:exp);
var t:integer;
begin if e↑.op=sum then
  begin eval(e↑.l);
    eval(e↑.r);
    e↑.op:=const;
    e↑.val:=e↑.r↑.val+e↑.l↑.val
  end
end
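
A Python sketch of the same idea makes the saving on shared sub-expressions countable. `Exp`, `const`, `sum_` and `eval_exp` are hypothetical names, and returning the number of additions is an addition made here for the sake of the illustration.

```python
class Exp:
    """An expression node: op is 'const' (uses val) or 'sum' (uses l and r)."""

    def __init__(self, op, val=None, l=None, r=None):
        self.op = op
        self.val = val
        self.l = l
        self.r = r


def const(v):
    return Exp("const", val=v)


def sum_(l, r):
    return Exp("sum", l=l, r=r)


def eval_exp(e):
    """Evaluate e in place; return the number of additions performed."""
    if e.op != "sum":
        return 0
    count = eval_exp(e.l) + eval_exp(e.r) + 1
    value = e.r.val + e.l.val
    # Overwrite the whole node: a sum node becomes a const node.
    e.op, e.val, e.l, e.r = "const", value, None, None
    return count


shared = sum_(const(1), const(2))
e = sum_(shared, shared)       # the same sub-expression is referenced twice
additions = eval_exp(e)
```

Because the shared node is overwritten with its value on the first visit, the second visit finds a constant and does no work.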

The pointer swinging version of this program would involve assignments like
e↑.l:=v and e↑.r:=v. Aside from being clumsier it would have to perform
three additions instead of two on a structure like